On-Line EM Reinforcement Learning

نویسندگان

  • Junichiro Yoshimoto
  • Shin Ishii
  • Masa-aki Sato
چکیده

In this article, we propose a new reinforcement learning (RL) method for a system having continuous state and action spaces. Our RL method has an architecture like the actorcritic model. The critic tries to approximate the Q-function, which is the expected future return for the current state-action pair. The actor tries to approximate a stochastic soft-max policy defined by the Q-function. The soft-max policy is more likely to select an action that has a higher Q-function value. The on-line EM algorithm is used to train the critic and the actor. We apply this method to two control problems. Computer simulations show that our method is able to acquire fairly good control in the two tasks after a few learning trials.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Reinforcement Learning Based on On-Line EM Algorithm

In this article, we propose a new reinforcement learning (RL) method based on an actor-critic architecture. The actor and the critic are approximated by Normalized Gaussian Networks (NGnet), which are networks of local linear regression units. The NGnet is trained by the on-line EM algorithm proposed in our previous paper. We apply our RL method to the task of swinging-up and stabilizing a sing...

متن کامل

EM for Perceptual Coding and Reinforcement Learning Tasks

The paper presents an algorithm for an EM-based reinforcement-driven clustering. As shown here it is applicable to the reinforcement learning setting with continuous state/discrete action space. E-step of the algorithm computes the posterior given the data and the reinforcement. Although designed to discover intrinsic states, the algorithm performs action selection without explicit state identi...

متن کامل

Parameterized Logic Programs where Computing Meets Learning

In this paper, we describe recent attempts to incorporate learning into logic programs as a step toward adaptive software that can learn from an environment. Although there are a variety of types of learning, we focus on parameter learning of logic programs, one for statistical learning by the EM algorithm and the other for reinforcement learning by learning automatons. Both attempts are not fu...

متن کامل

Using EM for Reinforcement Learning

We discsus Hinton’s (1989) relative payoff procedure (RPP), a static reinforcement learning algorithm whose foundation is not stochastic gradient ascent. We show circumstances under which applying the RPP is guaranteed to increase the mean return, even though it can make large changes in the values of the parameters. The proof is based on a mapping between the RPP and a form of the expectation-...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000